98 research outputs found
VST++: Efficient and Stronger Visual Saliency Transformer
While previous CNN-based models have exhibited promising results for salient
object detection (SOD), their ability to explore global long-range dependencies
is restricted. Our previous work, the Visual Saliency Transformer (VST),
addressed this constraint from a transformer-based sequence-to-sequence
perspective, to unify RGB and RGB-D SOD. In VST, we developed a multi-task
transformer decoder that concurrently predicts saliency and boundary outcomes
in a pure transformer architecture. Moreover, we introduced a novel token
upsampling method called reverse T2T for predicting a high-resolution saliency
map effortlessly within transformer-based structures. Building upon the VST
model, we further propose an efficient and stronger VST version in this work,
i.e. VST++. To mitigate the computational costs of the VST model, we propose a
Select-Integrate Attention (SIA) module, partitioning foreground into
fine-grained segments and aggregating background information into a single
coarse-grained token. To incorporate 3D depth information with low cost, we
design a novel depth position encoding method tailored for depth maps.
Furthermore, we introduce a token-supervised prediction loss to provide
straightforward guidance for the task-related tokens. We evaluate our VST++
model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD
benchmark datasets. Experimental results show that our model outperforms
existing methods while achieving a 25% reduction in computational costs without
significant performance compromise. The demonstrated strong ability for
generalization, enhanced performance, and heightened efficiency of our VST++
model highlight its potential
SAMN: A Sample Attention Memory Network Combining SVM and NN in One Architecture
Support vector machine (SVM) and neural networks (NN) have strong
complementarity. SVM focuses on the inner operation among samples while NN
focuses on the operation among the features within samples. Thus, it is
promising and attractive to combine SVM and NN, as it may provide a more
powerful function than SVM or NN alone. However, current work on combining them
lacks true integration. To address this, we propose a sample attention memory
network (SAMN) that effectively combines SVM and NN by incorporating a sample
attention module, class prototypes, and a memory block into the NN. SVM can be viewed
as a sample attention machine. It allows us to add a sample attention module to
NN to implement the main function of SVM. Class prototypes are representatives
of all classes, which can be viewed as alternatives to support vectors. The
memory block is used for the storage and update of class prototypes. Class
prototypes and memory block effectively reduce the computational cost of sample
attention and make SAMN suitable for multi-classification tasks. Extensive
experiments show that SAMN achieves better classification performance than
single SVM or single NN with similar parameter sizes, as well as the previous
best model for combining SVM and NN. The sample attention mechanism is a
flexible module that can be easily deepened and incorporated into neural
networks that require it
MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters
The selection of model's parameters plays an important role in the
application of support vector classification (SVC). The commonly used method of
selecting model's parameters is the k-fold cross validation with grid search
(CV). It is extremely time-consuming because it needs to train a large number
of SVC models. In this paper, a new method is proposed to train SVC with the
selection of model's parameters. Firstly, training SVC with the selection of
model's parameters is modeled as a minimax optimization problem
(MaxMin-L2-SVC-NCH), in which the minimization problem is an optimization
problem of finding the closest points between two normal convex hulls
(L2-SVC-NCH) while the maximization problem is an optimization problem of
finding the optimal model's parameters. A lower time complexity can be expected
in MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is
then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a
projected gradient algorithm (PGA) while the maximization problem is solved by
a gradient ascent algorithm with dynamic learning rate. To demonstrate the
advantages of the PGA in solving L2-SVC-NCH, we carry out a comparison of the
PGA and the famous sequential minimal optimization (SMO) algorithm after an SMO
algorithm and some KKT conditions for L2-SVC-NCH are provided. It is revealed
that the SMO algorithm is a special case of the PGA. Thus, the PGA can provide
more flexibility. The comparative experiments between MaxMin-L2-SVC-NCH and the
classical parameter selection models on public datasets show that
MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained without
losing test accuracy relative to the classical models. It indicates that
MaxMin-L2-SVC-NCH performs better than the other models. We strongly recommend
MaxMin-L2-SVC-NCH as a preferred model for SVC task
YOLOv5-TS: Detecting traffic signs in real-time
Traffic sign detection plays a vital role in assisted driving and automatic driving. YOLOv5, as a one-stage object detection solution, is very suitable for traffic sign detection. However, it suffers from false detections and missed detections of small objects. To address this issue, we have made improvements to YOLOv5 and subsequently introduced YOLOv5-TS in this work. In YOLOv5-TS, a spatial pyramid with depth-wise convolution is proposed by replacing the maximum pooling operations in spatial pyramid pooling with depth-wise convolutions. It is applied to the backbone to extract multi-scale features while preventing feature loss. A Multiple Feature Fusion module is proposed to fuse multi-scale feature maps multiple times, with the purpose of enhancing both the semantic expression ability and the detail expression ability of the feature maps. To improve the accuracy in detecting small and even extra-small objects, a specialized detection layer is introduced that uses the highest-resolution feature map. Besides, a new method based on k-means++ is proposed to generate stable anchor boxes. The experiments on the data set verify the usefulness and effectiveness of our work
Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning
The spread of rumors along with breaking events seriously hinders the truth
in the era of social media. Previous studies reveal that due to the lack of
annotated resources, rumors presented in minority languages are hard to
detect. Furthermore, the unforeseen breaking events not involved in
yesterday's news exacerbate the scarcity of data resources. In this work, we
propose a novel zero-shot framework based on prompt learning to detect rumors
falling in different domains or presented in different languages. More
specifically, we first represent rumors circulated on social media as diverse
propagation threads, then design a hierarchical prompt encoding mechanism to
learn language-agnostic contextual representations for both prompts and rumor
data. To further enhance domain adaptation, we model the domain-invariant
structural features from the propagation threads, to incorporate structural
position representations of influential community response. In addition, a new
virtual response augmentation method is used to improve model training.
Extensive experiments conducted on three real-world datasets demonstrate that
our proposed model achieves much better performance than state-of-the-art
methods and exhibits a superior capacity for detecting rumors at early stages. Comment: AAAI 202
Fiber Vector Bend Sensor Based on Multimode Interference and Image Tapping
A grating-less fiber vector bend sensor is demonstrated using a standard single mode fiber spliced to a multimode fiber as a multimode interference device. The ring-shaped light intensity distribution at the end of the multimode fiber is subject to a vector transition in response to the fiber bend. Instead of extensive image processing for the analysis, the image can be tapped out by a seven-core fiber spliced to the other end of the multimode fiber. The seven-core fiber is further guided to seven single mode fibers via a commercial fan-out device. By comparing the relative light intensities received at the seven outputs, both the bend radius and its direction can be determined. Experiments show that a slight bend displacement of 10 µm over a 1.2-cm-long multimode fiber in the X direction (bend angle of 0.382°) causes a distinctive power imbalance of 4.6 dB between two chosen outputs (numbered C4 and C7). For the same displacement in the Y direction, the power ratio between the previous two outputs C4 and C7 remains constant, while the imbalance between another pair (C3 and C4) rises significantly to 7.0 dB. © 2019 by the authors. Licensee MDPI, Basel, Switzerland
Scalable mode division multiplexed transmission over a 10-km ring-core fiber using high-order orbital angular momentum modes
We propose and demonstrate a scalable mode division multiplexing scheme based on orbital angular momentum (OAM) modes in ring core fibers. In this scheme, the high-order mode groups of a ring core fiber are sufficiently de-coupled by the large differential effective refractive index so that multiple-input multiple-output (MIMO) equalization is only used for crosstalk equalization within each mode group. We design and fabricate a graded-index ring core fiber that supports 5 mode groups with low inter-mode-group coupling, small intra-mode-group differential group delay, and small group velocity dispersion slope over the C-band for the high-order mode groups. We implement a two-dimensional wavelength- and mode-division multiplexed transmission experiment involving 10 wavelengths and 2 mode groups each with 4 OAM modes, transmitting 32 GBaud Nyquist QPSK signals over all 80 channels. An aggregate capacity of 5.12 Tb/s and an overall spectral efficiency of 9 bit/s/Hz over 10 km are realized, using only modular 4x4 MIMO processing with 15 taps to recover signals from the intra-mode-group mode coupling. Given the fixed number of modes in each mode group and the low inter-mode-group coupling in ring core fibers, our scheme strikes a balance in the trade-off between system capacity and digital signal processing complexity, and therefore has good potential for capacity upscaling at the expense of only modularly increasing the number of mode groups with fixed-size (4x4) MIMO blocks
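The channel count and aggregate capacity quoted in the abstract above can be reproduced with a short calculation. The sketch below uses only the figures stated in the abstract and assumes the 5.12 Tb/s value is the raw line rate (2 bits per QPSK symbol, no coding overhead subtracted). The variable names are illustrative and do not come from the paper.

```python
# Raw line-rate check using only the figures quoted in the abstract.
# Assumption: the aggregate capacity is the raw rate, with no FEC overhead removed.
wavelengths = 10          # WDM wavelengths
mode_groups = 2           # high-order mode groups used
modes_per_group = 4       # OAM modes per mode group
symbol_rate_baud = 32e9   # 32 GBaud per channel
bits_per_symbol = 2       # QPSK carries 2 bits per symbol

channels = wavelengths * mode_groups * modes_per_group
per_channel_bps = symbol_rate_baud * bits_per_symbol
aggregate_bps = channels * per_channel_bps

print(f"channels: {channels}")
print(f"per-channel rate: {per_channel_bps / 1e9:.0f} Gb/s")
print(f"aggregate rate: {aggregate_bps / 1e12:.2f} Tb/s")
```

Under these assumptions the result matches the 80 channels and 5.12 Tb/s stated in the abstract. The spectral efficiency of 9 bit/s/Hz is not recomputed here because the abstract does not give the channel spacing or coding overhead.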
The ALMA-QUARKS survey: I. Survey description and data reduction
This paper presents an overview of the QUARKS survey, which stands for
`Querying Underlying mechanisms of massive star formation with ALMA-Resolved
gas Kinematics and Structures'. The QUARKS survey is observing 139 massive
clumps covered by 156 pointings at ALMA Band 6 (λ ~ 1.3 mm). In
conjunction with data obtained from the ALMA-ATOMS survey at Band 3
(λ ~ 3 mm), QUARKS aims to carry out an unbiased statistical
investigation of the massive star formation process within protoclusters down to a
scale of 1000 au. This overview paper describes the observations and data
reduction of the QUARKS survey, and gives a first look at an exemplar source,
the mini-starburst Sgr B2(M). The wide-bandwidth (7.5 GHz) and
high-angular-resolution (~0.3 arcsec) observations of the QUARKS survey allow
us to resolve much more compact cores than could be done by the ATOMS survey, and
to detect previously unrevealed fainter filamentary structures. The spectral
windows cover transitions of species including CO, SO, N2D+, SiO,
H30α, H2CO, CH3CN and many other complex organic molecules,
tracing gas components with different temperatures and spatial extents. QUARKS
aims to deepen our understanding of several scientific topics of massive star
formation, such as the mass transport within protoclusters by (hub-)filamentary
structures, the existence of massive starless cores, the physical and chemical
properties of dense cores within protoclusters, and the feedback from already
formed high-mass young protostars. Comment: 9 figures, 4 tables, accepted by RA
- …